tcpdl V2.5 distribution October 1998
CONTENTS
========
This archive consists of the following files:
tcpdl 2.5a        the tcpdl http download program
tcpdlpp 1.5       the post processor for downloaded pages
tcpdl.doc         this document
tcpdl_html        a subdirectory containing a version of this document
                  in html.
urllist           example urllist file
tcpdl.config      example tcpdl.config file
tcpdlpp.config    example tcpdlpp.config file
INTRODUCTION
============
tcpdl is a program to download files from http hosts. It can follow
links and can therefore be used to download whole web sites.
tcpdlpp is a program to post process files that have been downloaded
by tcpdl. This amends the URLs within each html file so that the
downloaded pages may be used locally, while any URLs that have not
been downloaded refer to their full URL.
COPYRIGHT
=========
tcpdl is Copyright © 1996 Patrik Nilsson
Copyright © 1997, 1998 Ramjam Consultants Ltd
tcpdlpp is Copyright © 1997, 1998 Ramjam Consultants Ltd
This archive is freely distributable, but may not be included in any
commercial software collection other than Aminet without prior permission.
REQUIREMENTS
============
To use tcpdl you need a TCP/IP stack and a connection to an http host.
Both tcpdl and tcpdlpp should work on any Amiga system which supports
a TCP/IP stack.
In use, tcpdl typically requires approximately 1MB of RAM plus 500 bytes
for each file to be downloaded, and tcpdlpp requires about 100K plus
100 bytes per file to be processed.
In the worst case, free disk space of about twice the total size of the
files to be downloaded is also required.
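As a rough worked example of the memory figures above, the following
Python sketch estimates RAM use for a given number of files. It assumes
1MB is 1,048,576 bytes and 100K is 102,400 bytes; the results are only as
approximate as the quoted figures.

```python
# Rough RAM estimate from the approximate figures quoted above.
# ASSUMPTION: 1MB = 1,048,576 bytes and 100K = 102,400 bytes.
TCPDL_BASE = 1024 * 1024      # approx. 1MB for tcpdl itself
TCPDL_PER_FILE = 500          # approx. bytes per file to be downloaded
TCPDLPP_BASE = 100 * 1024     # approx. 100K for tcpdlpp itself
TCPDLPP_PER_FILE = 100        # approx. bytes per file to be processed


def tcpdl_ram(files: int) -> int:
    """Approximate RAM (bytes) tcpdl needs for the given number of files."""
    return TCPDL_BASE + TCPDL_PER_FILE * files


def tcpdlpp_ram(files: int) -> int:
    """Approximate RAM (bytes) tcpdlpp needs for the given number of files."""
    return TCPDLPP_BASE + TCPDLPP_PER_FILE * files


if __name__ == "__main__":
    n = 2000
    print(f"tcpdl:   {tcpdl_ram(n) / 1024:.0f}K for {n} files")
    print(f"tcpdlpp: {tcpdlpp_ram(n) / 1024:.0f}K for {n} files")
```

For 2000 files this gives roughly 2MB for tcpdl and about 300K for
tcpdlpp.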
SUPPORT
=======
Please report any problems, and send any suggestions or comments to:
support@ramjam.u-net.com
The most recent version will always be available from
http://www.ramjam.u-net.com/
Major versions may also be available from Aminet.
There is no requirement to register use of this program, but anyone
doing so will be kept informed of updates.
Both programs have been optimised for the 68000 processor in order to
be as generic as possible.
KNOWN PROBLEMS
==============
tcpdl:
- doesn't resolve links from HTML with a <BASE> specification correctly
(a fix is planned for a future version)
- doesn't allow access to sites which require a userid/password
(enhancement planned for a future version)
- doesn't support the automatic download of URLs referenced by the
name attribute of the object tag. This is because the name attribute
does not always apply to a URL. Since the object tag is only supported
by IE3, and the name attribute refers to a URL that is likely to be
applicable only to IE3, this is not seen as a significant restriction.
(no fix planned)
tcpdlpp:
- doesn't handle links from HTML with a BASE specification correctly
(a fix is planned for version 1.6)
DISCLAIMER
==========
A great deal of effort has gone into making these programs as reliable
as possible. However, there is no guarantee that they will perform as
described in all cases.
These programs are used entirely at the user's own risk. No liability
can be accepted for loss of data resulting from the use of these
programs.
ACKNOWLEDGEMENTS
================
Thanks to Patrik Nilsson for the original version of tcpdl.
Thanks to David Stroud for producing an html version of this document.
Thanks also to all the users of tcpdl for their comments and suggestions.
CHANGES FROM PREVIOUS RELEASE
=============================
tcpdl
-----
V2.5 -> V2.5a:
- Fixed a problem where the leading enclosing quotes around URLs
in downloaded HTML files were omitted.
V2.4a -> V2.5:
- Added a NOWAIT option to allow tcpdl to exit without waiting for
return to be pressed. This option can be used from the command
line, or can be specified in the config file.
- Added a size gadget to the status window which allows the width
of the window to be adjusted (the height is fixed by the number
of tasks specified).
- Added a PUBSCREEN option to allow tcpdl to open its status window
on a specified public screen. If no public screen is specified,
then the Workbench screen will be used by default as in previous
versions. This option can be used from the command line, or can
be specified in the config file.
- Added a FONT option to specify the font used by the tcpdl status
window. By default the Xen font will be used, or the default font
if Xen is not found. This option can be used from the command line,
or can be specified in the config file.
- Added a FONTSIZE option to specify the size of the font used by
the tcpdl status window. By default a size of 9 is used. This option
can be used from the command line, or can be specified in the config
file.
- Added a PRIORITY option to specify the maximum priority of the
tcpdl tasks. By default a priority of 2 is used. This option can be
used from the command line, or can be specified in the config file.
V2.4 -> V2.4a:
- fixed a bug that caused non-text files to be downloaded even
when the previously downloaded version was still current.
This wasted bandwidth unnecessarily. Thanks to Jon Wareham
for reporting this problem.
V2.3c -> V2.4:
- reworked the code that deals with the http transfer. This should
make the code more tolerant of strange replies from the server,
and has paved the way for further improvements in the future.
- changed behaviour when URLs containing characters that are invalid
in AmigaDOS file names are encountered. Any invalid characters are
now encoded as %xx where xx is the hex value of that character.
This is transparent to web browsers and servers, as this form of
encoding is standard. The encoded filename will be used in HTML
references, as well as in the file name itself.
Previously, a CRC encoding was added to the file name, making it
hard to determine what the original file name was. This ensured it
was unique, but introduced its own problems.
- changed behaviour when filenames are longer than allowed by the
filesystem being used. Filenames are now simply truncated at
the maximum supported length (30 characters for FFS). The full
URL without truncation will appear in HTML references.
Previously, the filename was truncated and a CRC encoding was
added. The new approach could lead to filename clashes (where
more than one file in a directory has the same name) but this
is expected to cause fewer problems than the CRC encoding.
Note that the encoding of characters (described above) that are
invalid in filenames may cause filenames to exceed the length
limit and be truncated.
- added the NOSAVE command line option. When specified this
prevents any downloaded files from being saved.
This feature has been requested by users who use tcpdl to
prime local proxy servers (e.g. httpproxy) with files. If
tcpdl also saved the files, there would be 2 copies stored
locally.
- added the DEBUG command line option. When specified this
stores a copy of the HTTP request in each DATA file, and
leaves data files from failed transfers in the TEMP directory.
This is intended only for investigating problems, and isn't
intended for general use.
- added new status messages
"*BREAK*" indicates that the task has noticed the ctrl-C
"Not Found" indicates that the URL was not found on the server
"SRVR ERROR" indicates that the server reported an error
- previously, some temporary files were written to t:, which could
cause problems in low memory situations. All files are now
written to tcpdldir: so the user has control of the location
of all files generated (even temporarily) by tcpdl.
- corrected display of current and total file sizes. Previously
these were displayed only intermittently.
- corrected minor bug in the status display which caused columns
not to be correctly aligned
- reduced memory usage slightly
- corrected a bug that could leave some files locked if an error
occurred during a download
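The two V2.4 file-name changes above can be illustrated with a short
Python sketch: characters that cannot appear in an AmigaDOS file name are
replaced by %xx, and the result is then simply truncated to 30
characters. The set of "invalid" characters used here (':', '/' and
control characters) is an assumption for illustration; tcpdl's actual
set is not listed in this document.

```python
# Sketch of the V2.4 file-name rules: encode characters AmigaDOS cannot
# store as %xx, then truncate to the filesystem limit (no CRC is added).
# ASSUMPTION: the "invalid" set below is illustrative only.
FFS_MAX_NAME = 30  # maximum file name length on FFS, per the text above


def is_invalid(ch: str) -> bool:
    """Hypothetical test for a character not allowed in an AmigaDOS name."""
    return ch in ":/" or ord(ch) < 0x20


def encode_component(name: str) -> str:
    """Replace each invalid character with %xx (upper-case hex value)."""
    return "".join(f"%{ord(c):02X}" if is_invalid(c) else c for c in name)


def local_name(name: str, max_len: int = FFS_MAX_NAME) -> str:
    """Encode, then truncate; two long names may end up identical."""
    return encode_component(name)[:max_len]
```

Note how local_name("x" * 28 + ":") ends in "%3": the encoding pushes the
name past 30 characters, so the escape itself is cut, which is the case
the note above warns about.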
V2.3b -> V2.3c:
- added URL command line option. This allows the URL to be downloaded
to be given on the command line rather than in the urllist file.
See the COMMAND LINE OPTIONS section below for details.
- added CONFIG command line option. This can be used to specify the
configuration file to be used. If it is not specified, the default
tcpdldir:tcpdl.config is used as previously.
- added TASKS command line option to specify the number of tasks used
to download URLs. Valid range is 1 to 15.
- added TASKS config option. If present, this value acts as the default
number of tasks to be used in downloading URLs. Valid range is 1 to 15.
This value can be overridden by use of the command line TASKS option.
- added USERAGENT config option. If present, this specifies the user
agent name to be sent to the HTTP server. By default the user agent
name used will be of the form "tcpdl/<version>", where <version>
is the current version number. This can be used to masquerade as some
other HTTP client, for (broken) sites which only accept requests from
certain browsers.
- reduced the default number of processes to 12, as 15 was too much
for many machines.
- changed command line option handling to use AmigaDOS templates.
V2.3a -> V2.3b:
- tcpdl sent http commands that some hosts didn't like - it should now
manage to talk to hosts using any version of HTTP.
- The UPDATE flag wasn't correctly propagated to all the URLs referenced
by the files marked with UPDATE in the urllist.
This was a fundamental design flaw, and so the UPDATE flag has been
removed and replaced by an UPDATE command line option which applies
to all URLs in urllist.
- The IMG flag didn't quite do what was advertised, so it has been
amended slightly:
TEXT means all URLs that are referenced by a "href" attribute
IMG means all URLs that are referenced by an attribute other than "href"
ALL means all URLs that are referenced
- added a configuration option USER, which specifies the mail address
to be sent in HTTP commands. By default tcpdl uses the
username and host name used by the tcp/ip stack. If the USER option
is specified with no mail address, then no From: header will be
sent in HTTP commands.
- tcpdl used to send the user's realname to the http host - there is no
need to do this, so now it doesn't.
- fixed the DISK-ERR message, which looked untidy.
V2.3 -> V2.3a:
- made it possible to exit tcpdl when run from Workbench
- fixed enforcer hits when urllist contains a blank line
V2.2 -> V2.3:
- added support for proxy servers
- added configuration options for connection and http transfer timeouts
- added configuration option for the number of retries for failed
transfers
- tcpdl now uses memory pools for its memory allocation. This improves
the allocation times slightly, but has a dramatic effect on the
deallocation time. It also reduces the risk of memory "leakage" when
tcpdl is interrupted.
- the User Agent name has been changed to conform to RFC2068
- the tcpdl.config file is now closed as soon as it has been read
- the environment variable USERNAME is now used if USER is not set
- changed the ERROR status indicator to be more specific about the
cause of the error
V2.1 -> V2.2:
- added tcpdl.config to allow the specification of file types that are
not to be downloaded.
V2.0 -> V2.1:
- fixed problem where URLs in the urllist file which were not followed
by at least one space or tab character were ignored.
tcpdlpp
-------
V1.3 -> V1.4:
- changed the handling of URLs with a leading '/'. All URLs are now
converted to relative form, so all local links should work when
browsing off-line.
V1.2 -> V1.3:
- minor optimizations
V1.1 -> V1.2:
- improved the handling of ".." in URLs
- added optional translation of characters in URLs, by means of a new
configuration file "tcpdldir:tcpdlpp.config"
USING TCPDL AND TCPDLPP
=======================
OVERVIEW:
---------
Both tcpdl and tcpdlpp expect the assign tcpdldir: to refer to a directory.
This directory is the work area for both programs.
By default, the urllist file, containing the list of URLs to be downloaded,
is expected to be in this directory. The optional configuration files,
tcpdl.config and tcpdlpp.config, are also expected to be in this directory.
When tcpdl downloads URLs it creates three directories below tcpdldir:
(TEMP, DATA and HTTP). As its name suggests, TEMP is used only for
temporary work files. Beneath DATA and HTTP one directory is created for
each host, and beneath each of these are the directories and files which
are downloaded.
The HTTP directory contains the actual files that are downloaded, while the
DATA directory contains files holding information about each file downloaded.
Example:
the following directory tree shows the structure that might result from use
of the example URLs given below in the section on the file "TCPDLDIR:urllist".
tcpdldir:
   |
   +------- urllist
   |
   +------- TEMP
   |          |
   |         ...
   |
   +------- HTTP
   |          |
   |          +------- www.ramjam.u-net.com
   |          |          |
   |          |          +------- index.html
   |          |          |
   |          |          +------- amiga
   |          |          |          |
   |          |         ...        ...
   |          |
   |          +------- www.amiga.com
   |          |          |
   |          |          +------- index.html
   |          |          |
   |         ...        ...
   |
   +------- DATA
              |
              +------- www.ramjam.u-net.com
              |          |
             ...        ...
Thus, once a file has been downloaded, it appears within a directory that
identifies the host from which it came.
The DATA directory is used by tcpdl during the download and update process.
The final files appear in the HTTP directory. The DATA/HTTP directory
tree will mirror the actual HTTP directory tree, but the files contain
the HTTP response from the server which includes information that
tcpdl uses. This information includes the date and time of download
(used when performing an UPDATE), and for html files, a list of all
the URLs that are referenced.
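The UPDATE check relies on the HTTP "If-Modified-Since" header (see the
"Updating" status below). As a sketch only, the Python function below
builds that kind of conditional request from the stored download date.
The request line, HTTP version and header order are assumptions, not a
copy of what tcpdl sends; the User-Agent and From headers follow the
USERAGENT and USER descriptions in the tcpdl.config section.

```python
# Sketch of a conditional GET such as an UPDATE run could send.
# ASSUMPTIONS: HTTP/1.0, the header set and their order are illustrative;
# last_download stands for the download date kept in the DATA file.
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional


def build_request(host: str, path: str, version: str,
                  last_download: Optional[datetime] = None,
                  user: Optional[str] = None) -> str:
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        f"User-Agent: tcpdl/{version}",   # default form "tcpdl/<version>"
    ]
    if user:                              # empty/absent: no From: header
        lines.append(f"From: {user}")
    if last_download is not None:
        # HTTP dates are always expressed in GMT.
        stamp = format_datetime(last_download.astimezone(timezone.utc),
                                usegmt=True)
        lines.append(f"If-Modified-Since: {stamp}")
    return "\r\n".join(lines) + "\r\n\r\n"  # blank line ends the headers
```

A server whose copy has not changed since that date replies with status
304 (Not Modified) and no body, so the file need not be downloaded again.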
The TEMP directory holds temporary files during processing by tcpdl.
Unless the DEBUG option is specified, all temporary files should be
deleted by tcpdl upon exit. Any files that are left behind in this
directory may be safely deleted.
The HTML files downloaded by tcpdl will have references to URLs replaced by
a reference to a local file, e.g.
http://www.ramjam.u-net.com/home.html
will become
file://localhost/tcpdldir:http/www.ramjam.u-net.com/home.html
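The mapping from an absolute URL to the local reference can be sketched
in Python as below. The default page name used for URLs ending in "/"
("index.html") is a guess suggested by the example directory tree, not
documented behaviour, and query strings are ignored.

```python
# Sketch of the URL -> local reference mapping shown above.
# ASSUMPTION: "index.html" as the default page name is hypothetical.
from urllib.parse import urlsplit

LOCAL_PREFIX = "file://localhost/tcpdldir:http/"


def local_reference(url: str, default_page: str = "index.html") -> str:
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError(f"not an http URL: {url}")
    path = parts.path or "/"
    if path.endswith("/"):
        path += default_page            # hypothetical default-page name
    # One directory per host, then the path on that host.
    return LOCAL_PREFIX + parts.hostname + path
```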
The post processor, tcpdlpp, processes all files within the tcpdldir:http
directory and converts references to other files that are present in this
file hierarchy to relative URLs, and converts all other references back to
absolute URLs.
This allows the files within the tcpdldir:http directory to be browsed
offline, while allowing links to other URLs to be followed if the user
happens to be online. By downloading your favourite pages, you can browse
the web much faster, while still being able to follow links to other sites.
The downloaded pages may be updated periodically using tcpdl with the
UPDATE option, and then running tcpdlpp again to adjust any amended
references.
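The post-processing step can be sketched in Python as follows: a local
reference to a file that is present under tcpdldir:http becomes a
relative URL, and one to a file that is not present reverts to an
absolute http URL. The function name and the representation of
downloaded files as host/path strings are assumptions for illustration.

```python
# Sketch of tcpdlpp's rewrite of one reference inside one HTML file.
# ASSUMPTION: "downloaded" holds "host/path" strings for every file
# present under tcpdldir:http; rewrite() is a hypothetical name.
import posixpath

LOCAL_PREFIX = "file://localhost/tcpdldir:http/"


def rewrite(ref: str, referring_file: str, downloaded: set) -> str:
    """referring_file is the host/path of the HTML file containing ref."""
    if not ref.startswith(LOCAL_PREFIX):
        return ref                       # not a tcpdl local reference
    target = ref[len(LOCAL_PREFIX):]     # e.g. "www.ramjam.u-net.com/home.html"
    if target in downloaded:
        # Relative to the directory holding the referring file.
        return posixpath.relpath(target, start=posixpath.dirname(referring_file))
    return "http://" + target            # not present: back to absolute
```

Because the result is relative (e.g. "../www.ramjam.u-net.com/home.html"
from a page on www.amiga.com), links should keep working wherever the
tcpdldir:http tree is browsed from.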
TO START:
---------
1. Assign "tcpdldir:" to the directory which is to contain the
   downloaded files (this directory should already exist),
   e.g. "assign tcpdldir: work:tcpdldir"
2. Edit the file tcpdldir:urllist so that it contains the URLs
   to be downloaded.
3. Edit tcpdldir:tcpdl.config as required.
4. Edit tcpdldir:tcpdlpp.config as required.
5. Check that there is enough disk space for the pages you intend
   to download.
6. Connect to the Internet.
7. Run "tcpdl" from a shell.
8. If required, run "tcpdlpp" from a shell (this can be done offline).
   If the output of tcpdlpp shows a lot of references to a
   non-local URL, you may want to use tcpdl to download that URL.
   After downloading it, re-run tcpdlpp to change all links to that
   URL to refer to the local file.
COMMAND LINE OPTIONS:
---------------------
tcpdl accepts a number of command line options.
URL=<URL specification>
The URL specification can either be just a URL, or a URL with download
options as in the urllist file. If options are specified, then the
URL and options must all be enclosed within quotes.
e.g.
tcpdl url=http://www.ramjam.u-net.com/
tcpdl url="http://www.ramjam.u-net.com/ TEXT"
URLLIST=<file>
The URLLIST option specifies a file containing a list of URLs to be
downloaded. If this option is not specified then "tcpdldir:urllist"
is used by default. See the section on the file "TCPDLDIR:urllist"
for details of the format of this file.
CONFIG=<file>
The CONFIG option specifies a file containing configuration options. If
this option is not specified then "tcpdldir:tcpdl.config" is used by
default. See the section on the file "TCPDLDIR:tcpdl.config" for
details of the format of this file.
TASKS=<number>
The TASKS option specifies how many URLs will be downloaded at once.
This overrides any TASKS value specified in tcpdl.config. The valid
range of values is 1 to 15.
UPDATE
The UPDATE option specifies that each file that has already been
downloaded will be checked to see whether it has been updated since
then. If it has, it will be downloaded again.
NOSAVE
The NOSAVE option specifies that the downloaded files should not be
saved. This may be useful if tcpdl is used to prime a local proxy
server, or in testing http servers.
NOWAIT
The NOWAIT option specifies that tcpdl won't wait for return to be
pressed before exiting. This makes it easier to use tcpdl from
within scripts.
DEBUG
The DEBUG option specifies that the files within the DATA hierarchy
should contain a copy of the HTTP request that was sent to the
server, as well as the response and other usual information. It
also disables the deletion of temporary files from the TEMP directory
for transfers that failed. This can be useful when investigating the
reason for failed transfers.
PUBSCREEN=<name>
The PUBSCREEN option specifies the name of the public screen that
the tcpdl window will be opened on. By default the Workbench screen
will be used.
FONT=<name>
The FONT option specifies the name of the font that the tcpdl window
should use. Note that the .font suffix should not be specified.
Note also that a monospaced font should be used (a proportional
font will cause the columns not to align correctly).
FONTSIZE=<number>
The FONTSIZE option specifies the size of the font the tcpdl window
should use. The size of the tcpdl status window will be adjusted
accordingly. By default a size of 9 is used - if you have a very
high screen resolution you may wish to increase the font size.
PRIORITY=<number>
The PRIORITY option specifies the maximum priority to be used by the
tcpdl tasks. Priorities must be in the range 0 to 5 inclusive. By
default a priority of 2 will be used. Using a value of 0 may be
useful if you want tcpdl to operate in the background while you
are using a browser or IRC client.
TCPDL TASK STATUS WINDOW:
-------------------------
The tcpdl status window is updated approximately once per second (note
that not every change in status will have a chance to appear in the window).
tcpdl can download a number of files at once. There is one line in the
status window for each of these tasks. The fields on each line are
described below:
Status: one of the following values:
   "Connecting"   Trying to connect to a host.
   "Sending"      Sending request.
   "Header"       Receiving header.
   "Updating"     Requesting from a host using "If-Modified-Since",
                  or loading data from tcpdldir:data/
   "OK"           File downloaded successfully
   "Receiving"    Receiving data.
   "Wait. html"   The limit of 512K of html data awaiting processing
                  has been reached. Processing will continue when
                  the amount outstanding falls below the limit.
   "Proc. html"   Processing html.
   "Copying"      Copying html file to tcpdldir:http/
   "Not Found"    The server reported the URL not found
   "LIB ERR"      Unable to open bsdsocket.library
   "HOST ERR"     Unknown or unreachable host
   "SOCK ERR"     Unable to open socket
   "CON ERR"      Unable to connect to host
   "HDR ERR"      Failed to download header
   "RECV ERR"     Failed while receiving data
   "FILE ERR"     Failed to open output file
   "SRVR ERR"     Server reported an error
   "DISK ERR"     Failed while writing to output file
                  (most likely the disk is full)
   "ERROR"        Some other error occurred
   "*BREAK*"      The task has recognised a user break,
                  or an error is causing an abort (e.g.
                  the disk is full)
Time:      elapsed time since trying to connect
CPS:       the current download rate achieved for this file
CSize:     the current size of the data received
FSize:     the final size of the data, if given by the server
Request:   the URL requested
The top line of the status window also contains an overall progress
indicator, DONE:<n> TOTAL:<m>, where:
   <n>            the number of files downloaded so far
   <m>            the number of files listed in memory
The bottom line of the status window gives some overall performance figures:
   Total time:    the elapsed time since tcpdl started execution
   Total bytes:   the total number of bytes downloaded so far
   Average cps:   the average number of characters per second downloaded
OTHER NOTES:
------------
Execution may be terminated using CTRL-C. The program exits as quickly
as it safely can, which may take a little while if it is processing large
HTML files at the time. As each task notices the CTRL-C, its status will
change to "*BREAK*".
If a particular host times out on more than 5 occasions, no further attempts
are made to download any files from that host. This avoids wasting time
attempting to connect to a host that is down.
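The per-host rule above amounts to a simple counter. In this Python
sketch (the class and method names are hypothetical) a host is skipped
once it has timed out more than 5 times.

```python
# Sketch of the per-host timeout rule; names are hypothetical.
from collections import Counter

MAX_TIMEOUTS = 5


class HostTimeouts:
    def __init__(self) -> None:
        self.counts = Counter()   # host name -> number of timeouts so far

    def record_timeout(self, host: str) -> None:
        self.counts[host] += 1

    def should_skip(self, host: str) -> bool:
        # "more than 5 occasions": skip from the sixth timeout onwards
        return self.counts[host] > MAX_TIMEOUTS
```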
THE FILE "TCPDLDIR:urllist":
----------------------------
The urllist file should contain one or more URLs which are to be downloaded.
Each URL should start on a new line, and be followed by the appropriate
flags, separated by spaces.
The supported flags are:
   D1, D2, ..., D19   download n levels of text/html.
                      DEFAULT 255 (!)
   H0, H1, ..., H5    if another host is referenced by an HREF, the
                      maximum number of levels is set to n.
                      DEFAULT H0, current host only.
   P0, P1, ..., P5    if the path differs from the one given in your
                      urllist, the maximum number of levels is set to n.
                      DEFAULT Pn, where n is the same as for Dn
   TEXT               download files referenced by an "href"
                      attribute. These will commonly, but not
                      exclusively, be HTML files.
   IMG                download files referenced by an attribute
                      other than "href". These are commonly,
                      but not exclusively, images.
   ALL                download all types of files. This is the
                      default if none of IMG, TEXT, ALL are specified.
                      (Note that this will not download files with
                      types that appear in IGNORE lines within
                      tcpdl.config)
e.g.
http://www.amiga.com/index.html D2 H3 TEXT
will download 2 levels of html files referenced by the specified file
from www.amiga.com, and 3 levels of links to any other host.
http://www.ramjam.u-net.com/home.html D5 H0 ALL
will download 5 levels of files referenced by the specified file
from www.ramjam.u-net.com, but will not download any files that
are referenced on any other host.
http://www.ramjam.u-net.com/ TEXT
will download all text files referenced by the default home page
from the host www.ramjam.u-net.com.
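A sketch of how one urllist line could be parsed into its URL and flags,
in Python, using the defaults described above (D255, H0, Pn equal to Dn,
ALL). How tcpdl itself treats unknown or out-of-range flags is not
documented; this sketch assumes it is an error and raises ValueError.
Flags are matched case-sensitively here, which is also an assumption.

```python
# Sketch of parsing one urllist line; Entry and parse_urllist_line are
# hypothetical names. Out-of-range/unknown flags raise (an assumption).
from dataclasses import dataclass, field
from typing import Optional

LEVEL_RANGES = {"D": (1, 19), "H": (0, 5), "P": (0, 5)}


@dataclass
class Entry:
    url: str
    depth: int = 255                    # Dn, DEFAULT 255
    host_levels: int = 0                # Hn, DEFAULT H0 (current host only)
    path_levels: Optional[int] = None   # Pn, DEFAULT same n as Dn
    kinds: set = field(default_factory=set)  # TEXT and/or IMG, or ALL


def parse_urllist_line(line: str) -> Optional[Entry]:
    words = line.split()                # URL then flags, spaces or tabs
    if not words:
        return None                     # blank line: nothing to download
    entry = Entry(url=words[0])
    for flag in words[1:]:
        if flag in ("TEXT", "IMG", "ALL"):
            entry.kinds.add(flag)
        elif flag[0] in LEVEL_RANGES and flag[1:].isdigit():
            lo, hi = LEVEL_RANGES[flag[0]]
            n = int(flag[1:])
            if not lo <= n <= hi:
                raise ValueError(f"{flag}: level must be {lo} to {hi}")
            if flag[0] == "D":
                entry.depth = n
            elif flag[0] == "H":
                entry.host_levels = n
            else:
                entry.path_levels = n
        else:
            raise ValueError(f"unknown flag: {flag}")
    if entry.path_levels is None:
        entry.path_levels = entry.depth  # Pn defaults to the Dn value
    if not entry.kinds:
        entry.kinds = {"ALL"}            # ALL is the default
    return entry
```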
THE FILE "TCPDLDIR:tcpdl.config":
---------------------------------
The tcpdl.config file is optional. If present, it will be read in and
the contents will be used in determining what file types will be
downloaded.
Any line with a hash ('#') in column 1 will be ignored as a comment.
White space is ignored, and the commands are not case-sensitive.
Currently the following configuration commands are supported:
IGNORE <suffix>
where <suffix> is a file suffix which should not be downloaded. Note
that such files mentioned explicitly in urllist will be downloaded,
but any such files referenced within html will not.
Note that <suffix> may contain any characters except white space,
but will only be matched against the end of a file name.
PROXY <proxyserver:port>
this specifies that all http requests should be sent via the
specified server. If the port number is omitted, then a default
of 8080 is used.
By specifying your ISP's proxy server you can improve download
speeds significantly - especially for busy sites.
A proxy will also be required for connections via a firewall.
CONTIMEOUT <seconds>
this specifies the initial timeout for each connection in seconds.
The default is 20. The timeout must be within the range 10 to 600.
HTTPTIMEOUT <seconds>
this specifies the timeout for each http request in seconds.
The default is 60. The timeout must be within the range 10 to 600.
RETRIES <number>
this specifies the number of attempts that will be made to download
each file. The default is 5. The value must be within the range
1 to 100.
USER <mail address>
the mail address is sent to the HTTP server as the address that
mail can be sent to if there are problems caused by tcpdl's requests.
If the USER option is specified without a mail address, then
HTTP requests will not include any mail address (this can help
maintain your anonymity). If the USER option is not present in
the config file, then your current user id and host name are
used.
TASKS <number>
the number of URLs that will be downloaded concurrently. If this
option is not specified then the default of 12 will be used.
The valid values are 1 to 15. This value may be overridden by
the TASKS command line option.
USERAGENT <string>
the string specifies the user agent name that will be sent to the
HTTP server. The string is assumed to start at the first non-blank
character after the USERAGENT keyword, and to run until the end
of the line - so it may contain spaces. This option allows tcpdl
to appear as if it is some other http client, which is necessary
to access some (broken) sites which only accept requests from
certain browsers.
NOWAIT
this specifies that tcpdl will not wait for return to be pressed
before it exits. This option is useful if tcpdl is run from within
a script.
PUBSCREEN <name>
this specifies the name of the public screen that the tcpdl status
window should use. This can be overridden by the PUBSCREEN command
line option.
FONT <name>
this specifies the name of the font that the tcpdl status window
should use. Note that the .font suffix should not be given, and
for best results only monospaced fonts should be used. This option
can be overridden by the FONT command line option.
FONTSIZE <value>
this specifies the size of the font that the tcpdl status window
should use. The height of the window will be adjusted appropriately.
By default a font size of 9 will be used. This option can be
overridden by the FONTSIZE command line option.
PRIORITY <number>
this specifies the maximum priority to be used by the tcpdl tasks.
Priorities must be in the range 0 to 5 inclusive. By default a
priority of 2 will be used. This option can be overridden by use
of the PRIORITY command line option. Using a value of 0 may be
useful if you want tcpdl to operate in the background while you
are using a browser or IRC client.
e.g. The following is an example of what may appear in tcpdl.config
#
# Specify the Demon Internet proxy server
#
PROXY www-cache.demon.co.uk:8080
#
# Specify the timeouts - small since we're using a proxy
#
CONTIMEOUT 10
HTTPTIMEOUT 30
#
# Specify the number of attempts for each file
#
RETRIES 2
#
# Specify that no mail address is to be sent to the server
#
USER
#
# Specify the number of URLs to be downloaded concurrently.
# This can be overridden by the TASKS command line option.
#
TASKS 12
#
# Specify the maximum priority to be used by the tcpdl tasks
#
PRIORITY 2
#
# Specify the FONT and FONTSIZE to be used
#
FONT Xen
FONTSIZE 9
#
# Specify the file suffixes not to be downloaded
#
# ignore lha archives
IGNORE .lha
# ignore zip archives
IGNORE .zip
# ignore .wav sound files
IGNORE .wav
# ignore MS-DOS executables
IGNORE .exe
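The following Python sketch reads a tcpdl.config file along the lines
described above: '#' in column 1 marks a comment, keywords are not
case-sensitive, numeric values are range-checked against the documented
limits, PROXY falls back to port 8080, and IGNORE suffixes are matched
against the end of a file name. It handles only a subset of the
commands, and the function names are hypothetical.

```python
# Sketch of reading tcpdl.config; handles only a subset of commands.
# parse_config and is_ignored are hypothetical names.
RANGES = {"CONTIMEOUT": (10, 600), "HTTPTIMEOUT": (10, 600),
          "RETRIES": (1, 100), "TASKS": (1, 15), "PRIORITY": (0, 5)}
DEFAULTS = {"CONTIMEOUT": 20, "HTTPTIMEOUT": 60, "RETRIES": 5,
            "TASKS": 12, "PRIORITY": 2}


def parse_config(text: str) -> dict:
    cfg = dict(DEFAULTS, IGNORE=[], PROXY=None, USER=None)
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue                        # comment or blank line
        parts = line.split(None, 1)
        keyword = parts[0].upper()          # keywords are not case-sensitive
        rest = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "IGNORE":
            cfg["IGNORE"].append(rest)
        elif keyword == "PROXY":
            host, _, port = rest.partition(":")
            cfg["PROXY"] = (host, int(port) if port else 8080)
        elif keyword == "USER":
            cfg["USER"] = rest              # "" means: send no From: header
        elif keyword in RANGES:
            lo, hi = RANGES[keyword]
            value = int(rest)
            if not lo <= value <= hi:
                raise ValueError(f"{keyword} must be in the range {lo} to {hi}")
            cfg[keyword] = value
        # other commands (USERAGENT, FONT, ...) are skipped in this sketch
    return cfg


def is_ignored(filename: str, suffixes: list) -> bool:
    """True if the file name ends with one of the IGNORE suffixes."""
    return any(filename.endswith(s) for s in suffixes)
```

USER left as None (rather than "") corresponds to the option being
absent, in which case the current user id and host name are used.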
THE FILE "TCPDLDIR:tcpdlpp.config":
-----------------------------------
The tcpdlpp.config file is optional. If present, the character translations
defined in it are applied to each URL processed. Note that the actual file
names are NOT changed, but only the URLs within each html file.
Each line of this file consists of a character literal to be converted
and a character literal or string that should replace it. Character
literals should be enclosed by single quotes ('), and strings should
be enclosed by double quotes (").
White space (spaces and tabs) is ignored unless inside a string.
Any line where the first non-whitespace character is a hash (#) is
treated as a comment and ignored.
Certain escape characters are allowed in character literals and strings:
\a = bell
\b = backspace
\f = formfeed
\n = newline
\r = carriage return
\t = horizontal tab
\v = vertical tab
\\ = backslash
\' = single quote
\" = double quote
\nnn = character with octal value nnn
\xnnn = character with hexadecimal value nnn
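Decoding the escapes in the table above might look like this Python
sketch. It assumes the surrounding quotes have already been stripped,
and that octal escapes take 1 to 3 digits and \x takes 1 to 3 hex
digits, as the table suggests; tcpdlpp's exact rules may differ.

```python
# Sketch of decoding tcpdlpp.config escapes in a literal or string body.
# ASSUMPTION: \nnn = 1-3 octal digits, \xnnn = 1-3 hex digits.
SIMPLE = {"a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r",
          "t": "\t", "v": "\v", "\\": "\\", "'": "'", '"': '"'}
OCT = "01234567"
HEX = "0123456789abcdefABCDEF"


def decode_escapes(body: str) -> str:
    out, i = [], 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])                  # ordinary character
            i += 1
            continue
        i += 1                                   # skip the backslash
        if i >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i]
        if nxt in SIMPLE:
            out.append(SIMPLE[nxt])
            i += 1
        elif nxt == "x":
            j = i + 1
            while j < len(body) and j < i + 4 and body[j] in HEX:
                j += 1
            if j == i + 1:
                raise ValueError("\\x needs at least one hex digit")
            out.append(chr(int(body[i + 1:j], 16)))
            i = j
        elif nxt in OCT:
            j = i
            while j < len(body) and j < i + 3 and body[j] in OCT:
                j += 1
            out.append(chr(int(body[i:j], 8)))
            i = j
        else:
            raise ValueError(f"unknown escape \\{nxt}")
    return "".join(out)
```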
e.g. the following line will convert MS-DOS style backslashes into the
AmigaDOS & Unix style forward slashes:
'\\' '/'
the following line will convert tilde into the safe "%xx" equivalent:
'~' "%7E"
Each character is translated using a single rule - even if the end result
includes a character which would have been translated by some other rule.
This allows two characters to be swapped over.
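The single-rule behaviour can be seen in this Python sketch, which
assumes the config file has already been parsed into a dictionary from
each character to its replacement. Each character of the URL is looked
up exactly once, so a replacement is never translated again, which is
what makes swapping two characters possible.

```python
# Sketch of applying tcpdlpp.config rules to one URL.
# ASSUMPTION: rules maps a single character to its replacement string.
def translate_url(url: str, rules: dict) -> str:
    # One lookup per input character; output is never re-translated.
    return "".join(rules.get(ch, ch) for ch in url)


if __name__ == "__main__":
    rules = {"\\": "/", "~": "%7E"}   # the two example lines above
    print(translate_url(r"dir\~user\page.html", rules))
    # prints dir/%7Euser/page.html
```

With rules {"a": "b", "b": "a"}, "ab" becomes "ba" rather than "aa" or
"bb".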
TCPDLPP LISTING
---------------
The tcpdlpp program post processes the files downloaded by tcpdl. It
expects the same tcpdldir: assign as tcpdl does.
A listing is sent to stdout, which consists of 3 sections:
- a list of each file processed or skipped. Since only html files will
contain URLs to be updated, all other files are skipped. This acts
as a progress indicator.
- a list of all local files. The number of references to each is given.
If a file has no references to it, either it is a top level html file,
or it is simply not referenced. If you are building a browsable copy
of your favourite sites, you may want to delete any unreferenced files
to save disk space.
- a list of all non-local URLs. As for the local files, the number of
references made to each by the local files is given. If a particular
URL has a lot of references, you may want to download that URL too.
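The reference counts in the second and third sections could be produced
as in this Python sketch. It assumes the references have already been
extracted from the HTML files and made comparable with the local file
names; the function name is hypothetical.

```python
# Sketch of the local / non-local reference counts in the listing.
# ASSUMPTION: references and local_files use the same naming scheme.
from collections import Counter


def reference_report(references: list, local_files: list):
    counts = Counter(references)
    # Every local file is listed, including those with 0 references.
    local = {f: counts.get(f, 0) for f in local_files}
    # Anything referenced that is not a local file is non-local.
    non_local = {u: n for u, n in counts.items() if u not in local}
    return local, non_local
```

A local file with a count of 0 is either a top-level page or
unreferenced, and non-local URLs with high counts are candidates for
adding to urllist.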
Each time you add or remove files from the tcpdldir:http directory, you
should re-run tcpdlpp to adjust any links that require amendment.
Files which are present, but whose referring URLs have been modified by
the translations in the tcpdlpp.config file, will be listed as non-local
unless the file name has been modified in the same way. If such files
are listed, the file names should be changed accordingly, and tcpdlpp
re-run to identify those files as local.
WARNING
-------
Note that unless explicitly stated to the contrary, the copyright of all
files on the WWW is held by the owners of the appropriate site.
If you intend to redistribute *any* files downloaded from the WWW please
ensure that you have the permission of the copyright holder to do so.